Digital electronic multiplier circuits.

A multiplier formed as a single integrated circuit chip
generates in consecutive clock cycles the single-precision
partial products of multiple-precision operands. Provision of an
on-chip temporary register and "wrap-back" path avoids
transmitting and externally storing intermediate results so that
no clock cycles are used solely for data-transfer or other
"overhead". Consecutive double-precision multiplications can
be performed concurrently so that complete quadruple-precision
products are generated every four cycles.

DIGITAL ELECTRONIC MULTIPLIER CIRCUITS

This invention relates to digital electronic 
multiplier circuits, and more particularly, to a 
monolithic integrated circuit multiplier, having 
an on-chip pipeline register and an on-chip wrap-back 
data path, capable of generating at each clock cycle
a single-precision portion of the extended-precision 
product of the multiple-precision operands. 

Typically, arithmetic operations are performed by an 
integrated circuit arithmetic-logic unit (ALU) having only 
single-precision capability. Multiple-precision operations are
performed by having the ALU repeatedly perform single-precision 
operations on suitably scaled single-precision operands. 
Temporary storage of the partial results is provided by registers 
external to the ALU chip. Scaling operations are also typically 
performed by external circuitry. Such off-chip data-transfers
and the limited, single-precision capability of the ALU require a 
large number of ALU clock cycles to perform an extended-precision 
multiplication involving double-precision operands. 

In addition to slowness, provision in the prior art of 
external registers and off-chip data paths requires a large
number of chips which must be interconnected during board 
manufacture. This results in larger design and assembly costs, 
lower reliability, and larger space requirements, than would be 
the case for a single chip capable of performing 
extended-precision multiplication. 
Interest in multiple-precision arithmetic operations has
recently been intensified by the need to efficiently perform

(P754) for 53-bit mantissa double-precision numbers.
However, even these important developments do not 
warrant the use of dedicated central processing units 
(CPUs) to perform the extended-precision operations.
Accordingly, there is an unmet need for a cost-
effective, efficient extended-precision multiplier. 

In the present specification, there is described
a monolithic integrated circuit extended-precision
multiplier capable of generating at each clock cycle a
single-precision partial-product of multiple-precision
operands. An on-chip pipeline (temporary-result)
register and an on-chip "wrap-back" path avoid the
need for external storage of temporary results or for
any off-chip intermediate data transfers. The
self-contained multiplier circuit can generate in four
consecutive clock cycles the four single-precision
words comprising the complete product of two
double-precision operands. Provision of the temporary
register allows initiation of a second
multiple-precision multiplication concurrently with the
processing of the present multiplication so that a
useful result is generated at every clock cycle of
the multiplier with no clock cycles being used solely
for data-transfers or other "overhead" operations.
Accordingly, the multiplier wastes no clock cycles
and realises a very high throughput.

In one illustrative embodiment, a multiplier
with a clock cycle of 50 nanoseconds capable of
processing 64-bit double-precision operands is provided.
Thus a 32-bit single-precision partial product word is
generated every 50 nanoseconds and the entire 128-bit
product is generated in 200 nanoseconds. A 32- x 32-bit
multiplier array operates on the 32 most-significant
bit portion and the 32 least-significant bit portion
of the operands which are stored in on-chip 32-bit
registers.

The least-significant single-precision product word is first
formed and stored in the temporary register to provide a two
clock-cycle pipeline delay so that the product words remaining
from the preceding multiplication can be generated before
generation of the product words for the current multiplication.
The single-precision "cross-product" words are formed next and
subjected to a single clock-cycle pipeline delay and are
generated following the least-significant product word. The
most-significant single-precision product word is last formed
and stored in the temporary register to provide a two clock-cycle
pipeline delay while the "cross-product" words are being
generated and is last generated during the seventh clock cycle
following application of the least-significant portions of the
operands. However, since these operands were applied
concurrently with the generation of the first "cross-product"
word from the preceding multiplication, a partial product word is
generated at every clock cycle with no interruptions.

Multiplexers are provided on the multiplicand and multiplier
data paths as well as on the product data path so that the
multiple-precision operands and product may be transferred onto
and off the chip in 32-bit words.

In the accompanying drawings, by way of example 
only, the sole figure is a functional block diagram 
of an extended-precision parallel multiplier embodying 
the invention. 

A 32- x 32-bit parallel multiplier 10 is shown
in the drawing. A set of thirty-two input terminals
DATA_IN_X0-31 receives a set of signals representing
either the least-significant or most-significant portion
of the 64-bit multiplicand word which are conducted
to an XA register (XA_REG) 102 via a 32-conductor
data bus 104 and to an XB register (XB_REG) 106 via
a 32-conductor data bus 108. (For convenient
notation, in the drawings there are many data paths
near which there are numbers enclosed by parentheses.
These enclosed numbers indicate the width of the adjacent
data path or the number of signals which can be
transferred in parallel on the data path. The IPC and various
other control blocks are not shown in Fig. 1, nor are their
design and operation described in detail herein, as they are
well-known to those skilled in the art.) XA_REG 102 and
XB_REG 106 receive a control signal from an
input control (IPC), not shown, which causes the thirty-two
signals applied thereto to be stored in the register. In this
manner, the 32 least-significant bits of the multiplicand
word are applied first to the DATA_IN_X0-31 terminals of
multiplier 10 and stored in the XA_REG 102 and then the
32 most-significant bits of the multiplicand word are applied to
the DATA_IN_X0-31 terminals and stored in the XB_REG 106.

An X multiplexer (X_MUX) 110 receives the 32-bit contents of
the XA_REG 102 via a 32-conductor data bus 112, the 32-bit
contents of the XB_REG 106 via a 32-conductor data bus 114, and
the signals applied to the DATA_IN_X0-31 terminals via a
32-conductor data bus 116, the latter being an alternative
"feed-through" data path which can be selected by the control
signal received by the X_MUX 110. The X_MUX 110 receives a
control signal from the IPC which causes the thirty-two
signals conducted on one of data busses 112, 114 or 116 to be
generated on a 32-conductor data bus 118 to a 32- x 32-bit
multiplier array 120.

A set of 32-conductor data busses 134, 138, 142, 144, 146
and 148, multiplexer Y_MUX 140, and registers YA_REG 132 and
YB_REG 136 is provided to store and transmit the
least-significant and most-significant portion of the 64-bit
multiplier word applied at a set of thirty-two input terminals
DATA_IN_Y0-31 to the 32- x 32-bit parallel multiplier 10. These
elements, as illustrated in Fig. 1, are interconnected and
operate in the same manner as the busses, multiplexers and
registers described above in connection with the reception,
storage and transmission of the 64-bit multiplicand word. In
this manner the least- or most-significant 32-bit portion of the
multiplier word can be conducted via bus 148 to the 32- x 32-bit
multiplier array 120.

The 32- x 32-bit multiplier array 120 is the subject of a
related, co-pending application filed concurrently herewith.
The multiplier array 120 generates on a 64-conductor
data bus 150 the 64-bit partial product word of the
32-bit multiplicand word present on the data bus 118
and the 32-bit multiplier word present on the data
bus 148.

A 67-bit partial product adder 152 receives at a first input
the 64-bit product via the data bus 150. A 67-bit word is
conducted to a second input of adder 152 via a 67-conductor
data bus 154 from a multiplexer (MUX) 156. MUX 156 receives a
control signal from a control block, not shown, which causes the
MUX 156 to select either the 67-bit word conducted on an internal
"wrap-back" path via a 67-conductor bus 158 or the 67-bit word
conducted on the wrap-back path right-shifted by thirty-two bit
positions by a 32-bit shifter 160 and conducted to MUX 156 via
bus 162 with thirty-four leading sign-extended bits appended.
Alternatively, a 67-bit word consisting of all ZEROs may be
selected by the control signal to cause MUX 156 to apply an
all-ZERO word to the second input of adder 152 via bus 154.

In this manner, the 67-bit partial-product adder 152
generates on a 67-conductor data bus 164 a 67-bit partial product
word formed from the sum of the 64-bit product word generated by
the multiplier array 120 and the contents of a 67-bit product
register (P_REG) 166, which receives the most-recently
generated partial product via data bus 164, stores the same, and
generates on the wrap-back path data bus 158 the 67-bit word
stored therein. The 67-bit contents of the P-register 166 may be
down-scaled by a factor of 2^32 by the 32-bit shifter 160, as
required, before addition to the 64-bit product word as will be
described below in connection with Table II.
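
By way of illustration only, the selection performed by MUX 156 and the addition performed by adder 152 may be sketched in Python as follows. The sketch assumes unsigned operands and therefore omits the sign extension applied on bus 162; the function names and the select strings "ZERO", "WRAP" and "SHIFT" are hypothetical labels that do not appear in the specification.

```python
MASK67 = (1 << 67) - 1  # width of adder 152 and P_REG 166


def mux_156(p_reg, select):
    """Word applied to the second input of adder 152 (unsigned model).

    select is a hypothetical label:
      "ZERO"  - the all-ZERO word
      "WRAP"  - the wrap-back word from P_REG 166, unchanged
      "SHIFT" - the wrap-back word right-shifted by 32 bit positions
    """
    if select == "ZERO":
        return 0
    if select == "WRAP":
        return p_reg
    if select == "SHIFT":
        return p_reg >> 32
    raise ValueError(f"unknown select: {select}")


def adder_152(product64, p_reg, select):
    """Next contents of P_REG 166: array product plus the selected word."""
    return (product64 + mux_156(p_reg, select)) & MASK67
```

With these labels, "ZERO" corresponds to passing the array product unchanged, "SHIFT" to a shift-and-add, and "WRAP" to a plain accumulation.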

The 67-bit contents of the P-register 166 are also conducted
to a first input of an output multiplexer (OUT_MUX) 168 via a
67-conductor data bus 170. The contents of the least-significant
32 bit positions and the next most-significant 32 bit positions
of the P_REG 166 are conducted to a first and a second set of
inputs, respectively, of a temporary multiplexer (T_MUX) 172 via
a 32-conductor data bus 173a and a 32-conductor data bus 173b,
respectively. A 32-bit temporary register (T_REG) 174 connected
to the T_MUX 172 via a 32-conductor data bus 175 receives either
the 32 least-significant bit contents of the P-register 166 or
the 32 next most-significant bit contents of the P-register 166,
respectively, in response to T select (T_SEL) and format adjust
(FA) signals applied to input terminals of multiplier 10 and
conducted to the T_MUX 172, in accordance with Table I,
below. The 32-bit contents of the T-register 174 are conducted
via a 32-conductor data bus 176 to a second input of the OUT_MUX
168. At a third input of the OUT_MUX 168, the 67-bit partial
product word generated by the adder 152 is applied via a
67-conductor bus 178.

TABLE I

Source Selection for the ith Bit Position of T_REG 174

T_SEL    FA      Source (via T_MUX 172)
LOW      LOW     Bit Position i-1 of P_REG 166
LOW      HIGH    Bit Position i of P_REG 166
HIGH     LOW     Bit Position i+31 of P_REG 166
HIGH     HIGH    Bit Position i+32 of P_REG 166
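
The source selection of Table I can be expressed compactly as an offset added to the bit position i. The following Python sketch is illustrative only: the function names are hypothetical, the T_SEL and FA levels are modelled as booleans (True for HIGH), and the treatment of the non-existent bit position -1 as ZERO when i = 0 is an assumption not stated in the specification.

```python
MASK32 = (1 << 32) - 1


def t_mux_source_bit(i, t_sel_high, fa_high):
    """P_REG 166 bit position feeding bit i of T_REG 174, per Table I."""
    offset = (32 if t_sel_high else 0) - (0 if fa_high else 1)
    return i + offset


def t_mux_word(p_reg, t_sel_high, fa_high):
    """Assemble the 32-bit word that T_MUX 172 presents to T_REG 174."""
    word = 0
    for i in range(32):
        src = t_mux_source_bit(i, t_sel_high, fa_high)
        # Assumption: a source position below bit 0 reads as ZERO.
        bit = (p_reg >> src) & 1 if src >= 0 else 0
        word |= bit << i
    return word
```

With FA HIGH the two T_SEL settings select the least-significant 32 bits and the next most-significant 32 bits of P_REG 166, respectively; with FA LOW each selection is taken from one bit position lower.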


T_REG 174 also receives an enable signal (ENT) applied
to an input terminal of multiplier 10 which causes the contents
of the T_REG to be generated on 32-conductor data bus 176 during the
next-following clock cycle. A result temporarily stored in T_REG
174 may be accordingly delayed by a selectable number of clock
cycles.

In response to control signals generated by an output
control (OPC), not shown, the OUT_MUX 168 causes the product of
the 64-bit multiplicand and multiplier words applied to
multiplier 10 to be generated at a set of thirty-two PROD_OUT0-31
terminals of multiplier 10, with the assistance of T-register
174, as will be described below in connection with Table II.

The 32- x 32-bit parallel multiplier 10 of the instant
invention can perform a multiplication of two numbers represented
by 64-bit words by sequentially applying first the
least-significant, and then the most-significant 32-bit portions
of the multiplicand and multiplier to the
DATA_IN_X0-31 and DATA_IN_Y0-31 terminals, respectively, of the
multiplier 10. The XA_REG 102 and the YA_REG 132 will then store
the least-significant 32-bit portions of the multiplicand and
multiplier words, XW0 and YW0, respectively, and the XB_REG 106
and the YB_REG 136 will store the most-significant 32-bit
portions, XW1 and YW1, respectively. A 128-bit product will
subsequently be generated by the multiplier 10, represented as
four 32-bit product words, PW3, PW2, PW1 and PW0, being
respectively the most-significant 32-bit portion PW3, the next
most-significant 32-bit portion PW2, the next to
least-significant 32-bit portion PW1, and the least-significant
32-bit portion PW0.

The resulting product is sequentially generated at the
PROD_OUT0-31 terminals of the multiplier 10 so that the product
word PW0 is first generated, followed by the product words PW1,
PW2 and PW3. The full 128-bit product (PROD) is related to
the four product words PW0, PW1, PW2 and PW3 by the equation:

PROD = (PW3 * 2^96) + (PW2 * 2^64) + (PW1 * 2^32) + PW0.
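
As an illustrative check of the relation between PROD and the four 32-bit product words, the following Python sketch splits a 128-bit value into PW0 to PW3 and reassembles it using the weights 2^32, 2^64 and 2^96. The function names are hypothetical and unsigned values are assumed.

```python
MASK32 = (1 << 32) - 1


def split_product(prod):
    """Return [PW0, PW1, PW2, PW3], least-significant word first."""
    return [(prod >> (32 * k)) & MASK32 for k in range(4)]


def join_product(pw0, pw1, pw2, pw3):
    """Reassemble PROD from its four 32-bit product words."""
    return (pw3 << 96) + (pw2 << 64) + (pw1 << 32) + pw0
```

The list returned by split_product is ordered in the same sequence in which the words are generated at the output terminals, PW0 first.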

The value of the signal applied to the terminals, on the
data bus, and the contents of the various registers within
multiplier 10 during the extended-precision multiplication of two
64-bit numbers is best described with reference to Table II,
below. One complete extended-precision multiplication is shown
within the dashed lines in Table II beginning with the zeroth
cycle of an external clock supplying synchronization signals to
multiplier 10, and continuing through the sixth cycle of the
clock. The value for the register, bus or terminal whose label
appears in the rows of Table II is shown by the entry in the
column corresponding to the clock cycle whose label appears
at the top of the column. Those entries falling to the left of
the left-most dashed line of Table II refer to the values
remaining from the preceding extended-precision multiplication
while those falling to the right of the right-most dashed line
refer to the values pertaining to the next-following
extended-precision multiplication. The purpose in showing these
preceding and next-following values in Table II is to exhibit the
manner in which a full 128-bit product, PROD, consisting of the
four 32-bit product words PW0, PW1, PW2 and PW3, can be generated
in four consecutive clock cycles by the multiplier 10 of the
instant invention by the application of the multiplicand and
multiplier words at a clock cycle during which the previous
product word PW1 is currently being generated by the multiplier
10. In this way, by appropriately "pipelining" the input
operands, the multiplier 10 of the present invention can generate
a complete extended-precision product every four cycles of the
external clock.
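
The pipelining just described can be illustrated by listing which product word appears at the output during each clock cycle. The Python sketch below is a hypothetical scheduling model, not part of the specification; it assumes that the k-th multiplication has its least-significant operand words applied at clock cycle 4k and that its product words PW0 to PW3 are generated at cycles 4k+3 to 4k+6, as in the single multiplication described in this specification.

```python
def output_schedule(num_multiplications):
    """Map clock cycle -> (multiplication index, product word index)."""
    schedule = {}
    for k in range(num_multiplications):
        start = 4 * k  # cycle at which XW0 and YW0 are applied
        for word in range(4):  # PW0..PW3 at cycles start+3 .. start+6
            schedule[start + 3 + word] = (k, word)
    return schedule


if __name__ == "__main__":
    for cycle, (k, word) in sorted(output_schedule(3).items()):
        print(f"cycle {cycle}: multiplication {k}, PW{word}")
```

Under these assumptions every clock cycle from cycle 3 onward carries exactly one product word, so no cycle is left idle between consecutive multiplications.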

Table II

64- x 64-Bit Multiplication Register, Bus, and Terminal Values

Register, Bus,                        Clock Cycle
Terminals         0         1         2         3         4         5         6
DATA_IN_X       : XW0       XW1                         : XW0       XW1
XA_REG 102        XW0     : XW0       XW0       XW0       XW0     : XW0       XW0
XB_REG 106        XW1       XW1     : XW1       XW1       XW1       XW1     : XW1
DATA_IN_Y       : YW0       YW1                         : YW0       YW1
YA_REG 132        YW0     : YW0       YW0       YW0       YW0     : YW0       YW0
YB_REG 136        YW1       YW1     : YW1       YW1       YW1       YW1     : YW1
DATA BUS 150      XW1*YW1 : XW0*YW0   XW1*YW0   XW0*YW1   XW1*YW1 : XW0*YW0   XW1*YW0
P_REG 166         S/A     : PASS      S/A       ACC       S/A     : PASS      S/A
T_REG 174                           : PW0                           PW3     : PW0
PROD_OUT          PW1       PW2       PW3     : PW0       PW1       PW2       PW3     :

(In each row, ":" marks where the left-most and right-most dashed lines cross that row.)



As shown in Table II, signals representing the
least-significant portion of the multiplicand word, XW0, and the
multiplier word, YW0, are applied to the DATA_IN_X0-31 and
DATA_IN_Y0-31 terminals, respectively, of multiplier 10 at clock
cycle 0. At the next clock cycle, 1, the XA_REG 102 and the YA_REG
132 receive control signals causing these applied
data signals to be stored in XA_REG 102 and YA_REG 132,
respectively, as shown in rows 2 and 5, respectively, of Table
II. Also at clock cycle 1, signals representing the
most-significant portion of the multiplicand word, XW1, and the
multiplier word, YW1, are applied to the DATA_IN_X0-31 and
the DATA_IN_Y0-31 terminals, respectively, of multiplier 10.
At the next clock cycle, 2, the XB_REG 106 and the YB_REG 136
receive control signals causing these applied data
signals to be stored in the XB_REG and the YB_REG, respectively,
as shown in rows 3 and 6, respectively, of Table II.
During clock cycle 1, the contents of XA_REG 102 and YA_REG
132 are conducted via X_MUX 110 and Y_MUX 140, respectively,
which have received control signals causing the signals
corresponding to the 32-bit words XW0 and YW0, respectively, to
be conducted on data busses 118 and 148, respectively, to the
32- x 32-bit multiplier array 120. As shown in row 7 of Table
II, labelled "DATA BUS 150", the multiplier array 120 performs a
multiplication of the operands XW0 and YW0. The resulting 64-bit
product XW0*YW0 is conducted via data bus 150 to the first input
of the 67-bit adder 152. MUX 156 receives control signals
causing the ZERO input signals applied thereto to be routed to
the second input of the 67-bit adder 152. Control signals
received by adder 152 cause the 64-bit product XW0*YW0 to PASS
unchanged to the 67-bit P_REG 166, as shown in the 8th row of
Table II, labelled "P_REG 166".

During clock cycle 2, the contents of XB_REG 106 and YA_REG
132 are conducted via X_MUX 110 and Y_MUX 140, respectively,
which have received control signals causing the signals
corresponding to the 32-bit words XW1 and YW0, respectively, to
be conducted on data busses 118 and 148, respectively, to the
32- x 32-bit multiplier array 120. As shown in row 7 of Table
II, labelled "DATA BUS 150", the multiplier array 120 performs a
multiplication of the operands XW1 and YW0. The resulting 64-bit
product XW1*YW0 is conducted via data bus 150 to the first input
of the 67-bit adder 152.

Thirty-two bit shifter 160 receives control signals causing
the 67-bit contents of the P_REG 166 conducted thereto on the
wrap-back bus 158 to be right-shifted by 32 bit positions (i.e.,
the previous product XW0*YW0 is divided by 2^32) and the results
applied to MUX 156. MUX 156 receives control signals causing the
signals corresponding to the resulting scaled product to be
routed to the second input of the 67-bit adder 152. Control
signals received by adder 152 cause the 64-bit product XW1*YW0 to
be added to the scaled previous product at the second input of
adder 152 and the sum stored in the P_REG 166. These operations
are shown in the 8th row of Table II by the entry "S/A" standing
for shift of the previous product and add to current product.

Also during clock cycle 2, the 32 least-significant bit
contents (prior to replacement by the just-mentioned sum) of the
67-bit P_REG 166 are applied to the 32-bit T_REG 174, as shown in
the 9th row of Table II. This causes the least-significant
product word PW0 to be stored temporarily in T_REG 174, since the
results of the preceding 64- x 64-bit multiplication have not been
completely generated at the PROD_OUT0-31 terminals of multiplier
10 as shown by the PW3 entry in the PROD_OUT row of Table II for
clock cycle 2.

During clock cycle 3, the contents of XA_REG 102 and YB_REG
136 are conducted via X_MUX 110 and Y_MUX 140, respectively,
which have received control signals causing the signals
corresponding to the 32-bit words XW0 and YW1, respectively, to
be conducted on data busses 118 and 148, respectively, to the
32- x 32-bit multiplier array 120. As shown in row 7 of Table
II, labelled "DATA BUS 150", the multiplier array 120 performs a
multiplication of the operands XW0 and YW1. The resulting 64-bit
product XW0*YW1 is conducted via data bus 150 to the first input
of the 67-bit adder 152.

The MUX 156 receives control signals causing the 67-bit
contents of the P_REG 166 (the sum formed at clock cycle 2 from
XW1*YW0) conducted thereto on the
wrap-back bus 158 to be routed to the second input of the 67-bit
adder 152. Control signals received by adder 152 cause the
64-bit product XW0*YW1 to be added to the previous product at the
second input of adder 152 and the sum stored in the P_REG 166.
These operations are shown in the 8th row of Table II by the
entry "ACC" standing for add the previous product to current
product.

Also during clock cycle 3, the least-significant product
word PW0 stored temporarily in T_REG 174 is conducted to
OUT_MUX 168 and control signals received thereby cause the
OUT_MUX 168 to route signals corresponding to PW0 to the
PROD_OUT0-31 terminals of multiplier 10 as shown by the PW0
entry in the PROD_OUT row of Table II for clock cycle 3.

During clock cycle 4, the contents of XB_REG 106 and YB_REG
136 are conducted via X_MUX 110 and Y_MUX 140, respectively,
which have received control signals causing the signals
corresponding to the 32-bit words XW1 and YW1, respectively, to
be conducted on data busses 118 and 148, respectively, to the
32- x 32-bit multiplier array 120. As shown in row 7 of Table
II, labelled "DATA BUS 150", the multiplier array 120 performs a
multiplication of the operands XW1 and YW1. The resulting 64-bit
product XW1*YW1 is conducted via data bus 150 to the first input
of the 67-bit adder 152.

Thirty-two bit shifter 160 receives control signals causing
the 67-bit contents of the P_REG 166 (the sum accumulated at
clock cycle 3 with XW0*YW1) conducted thereto on
the wrap-back bus 158 to be right-shifted by 32 bit positions
(i.e., the previous accumulated product is divided by 2^32) and
the results applied to MUX 156. MUX 156 receives control signals
causing the signals corresponding to the resulting scaled
product to be routed to the second input of the 67-bit adder
152. Control signals received by adder 152 cause the 64-bit
product XW1*YW1 to be added to the scaled previous product at the
second input of adder 152 and the sum stored in the P_REG 166.
These operations are shown in the 8th row of Table II by the
entry "S/A" standing for shift of the previous product and add to
current product.

Also during clock cycle 4, the 67-bit contents of the 67-bit
P_REG 166 are conducted to OUT_MUX 168 and control signals
received thereby cause the OUT_MUX 168 to route signals
corresponding to the 32 least-significant bits thereof,
corresponding to PW1, to the PROD_OUT0-31 terminals of multiplier
10 as shown by the PW1 entry in the PROD_OUT row of Table II for
clock cycle 4.

During clock cycle 5, the 67-bit contents of the P_REG 166
continue to be applied to OUT_MUX 168 (prior to replacement by
the least-significant product word PW0 from the next-following
64- x 64-bit multiplication) and control signals received thereby
cause the OUT_MUX 168 to route signals corresponding to the 32
least-significant bits, corresponding to PW2, to the PROD_OUT0-31
terminals of multiplier 10 as shown by the PW2 entry in the
PROD_OUT row of Table II for clock cycle 5.

Also during clock cycle 5, the 32 most-significant bit
contents (prior to replacement by the just-mentioned
next-following product word PW0) of the 67-bit P_REG 166 are
stored in the 32-bit T_REG 174, as shown in the 9th row of Table
II. This causes the most-significant product word PW3 to be
stored temporarily in T_REG 174, since the
next-to-the-most-significant product word PW2 is being currently
generated at the PROD_OUT0-31 terminals of multiplier 10 as just
described. During clock cycle 6, the most-significant product
word PW3 temporarily stored in T_REG 174 is conducted to OUT_MUX
168 and control signals received thereby cause the OUT_MUX 168
to route signals corresponding to PW3 to the PROD_OUT0-31
terminals of multiplier 10 as shown by the PW3 entry in the
PROD_OUT row of Table II for clock cycle 6.
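
The sequence of operations described for clock cycles 1 to 6 can be followed in a short Python sketch. It is a behavioural model under stated assumptions, not the circuit itself: operands are treated as unsigned, sign extension is omitted, register updates are modelled as sequential assignments, and the function name multiply_64x64 is hypothetical.

```python
MASK32 = (1 << 32) - 1


def multiply_64x64(x, y):
    """Return [PW0, PW1, PW2, PW3] for unsigned 64-bit x and y."""
    xw0, xw1 = x & MASK32, x >> 32
    yw0, yw1 = y & MASK32, y >> 32
    out = []

    # Cycle 1: XW0*YW0 is PASSed unchanged into P_REG.
    p_reg = xw0 * yw0

    # Cycle 2: low word of P_REG (PW0) goes to T_REG; then S/A with XW1*YW0.
    t_reg = p_reg & MASK32
    p_reg = (p_reg >> 32) + xw1 * yw0

    # Cycle 3: ACC with XW0*YW1; PW0 leaves T_REG for the output.
    p_reg = p_reg + xw0 * yw1
    out.append(t_reg)

    # Cycle 4: low word of P_REG is output as PW1; then S/A with XW1*YW1.
    out.append(p_reg & MASK32)
    p_reg = (p_reg >> 32) + xw1 * yw1

    # Cycle 5: low word of P_REG is output as PW2; next word (PW3) to T_REG.
    out.append(p_reg & MASK32)
    t_reg = (p_reg >> 32) & MASK32

    # Cycle 6: PW3 leaves T_REG for the output.
    out.append(t_reg)
    return out
```

Summing the returned words with weights 2^0, 2^32, 2^64 and 2^96 reproduces x*y, which gives a simple check that the PASS, S/A, ACC, S/A sequence accumulates the four partial products correctly for unsigned inputs.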

As shown in Table II, the operands for the next-following
64- x 64-bit multiplication are applied to multiplier 10 at
clock cycles 4 and 5, thereby permitting the product
corresponding to these operands to be generated at the
PROD_OUT0-31 terminals beginning with clock cycle 7, by a sequence of
steps identical with those described above, simply delayed by
four clock cycles. In this manner, a full multiple-precision
128-bit product word can be generated every four consecutive
clock cycles by the multiplier 10 of the instant invention from
64-bit multiplicand and multiplier words.

CLAIMS

1. A multiplier responsive to an external clock signal and a
plurality of external control signals which cyclically processes
a multiple-precision multiplicand word and a multiple-precision
multiplier word and cyclically generates therefrom at an output
an extended-precision product word, comprising:

means for selectively generating double-precision partial
product words for said multiplicand and multiplier words;

means responsive to said external clock signal and to
predetermined ones of said external control signals connected to
said partial product generating means for storing a selectable
portion of said partial product word and for generating said
selected portion at an output delayed by a selectable number of
periods of said external clock; and

multiplexer means responsive to predetermined ones of said
external control signals connected to said partial product
generating means, and to said storage and delay means for
selectively conducting the contents of a selectable portion
therefrom to said extended-precision multiplier output;
whereby said extended-precision product word is formed of
single-precision ones of said partial product words generated at
said multiplier output during successive ones of said external
clock periods.



2. An extended-precision multiplier according to claim 1
wherein said partial product generating means includes:

combinatorial multiplication means for generating the
product of said single-precision portions of said multiplicand
and multiplier words;

means having first and second inputs connected to said
combinatorial multiplication means for combining said product of
said single-precision portions and the most-recently generated
said double-precision partial product word into said
double-precision partial product word;

register means responsive to said external clock signal
connected to said combining means for storing said
double-precision partial product words, upon reception of said
external clock signal and for generating at an output said stored
word; and

a wrap-back path connecting said register means output to
said second input of said combining means.

3. An extended-precision multiplier according to claim 2
wherein said combining means includes:

means responsive to said double-precision product word
received at said second input of said combining means for scaling
said word by a predetermined amount;

multiplexer means responsive to said scaled and to said
unscaled double-precision product word and to a word
corresponding to numerical zero for selectively conducting one of
said words to an output; and

means having a first input connected to said first input of
said combining means and having a second input connected to said
multiplexer output for adding said words received at said first
and second inputs and generating said double-precision partial
product, being said sum.

4. An extended-precision multiplier according to claim 1
further including input register means responsive to said
external clock signal for storing a single-precision
least-significant and a single-precision most-significant
portion of said multiplicand and multiplier words wherein said
partial product generating means is connected to said input
register means and generates said double-precision partial
product words from predetermined ones of said most-significant
and least-significant portions of said multiplicand and
multiplier words.

5. An extended-precision multiplier according to claim 4
wherein said partial product generating means has a first and a
second input, said input register means includes a first
multiplicand register for storing said least-significant
single-precision portion of said multiplicand word and a second
multiplicand register for storing said most-significant
single-precision portion of said multiplicand word, and wherein
said extended-precision multiplier further includes:

multiplicand multiplexer means responsive to predetermined
ones of said external control signals connected to said first and
second multiplicand registers for selectively conducting the
contents of said registers to said first input of said partial
product generating means.

6. An extended-precision multiplier according to claim 4
wherein said partial product generating means has a first and a
second input, said input register means includes a first
multiplier register for storing said least-significant
single-precision portion of said multiplier word and a second
multiplier register for storing said most-significant
single-precision portion of said multiplier word, and wherein
said extended-precision multiplier further includes:

multiplier multiplexer means responsive to predetermined
ones of said external control signals connected to said first and
second multiplier registers for selectively conducting the
contents of said registers to said second input of said partial
product generating means.

7. An extended-precision multiplier according to claim 1
wherein said storage and delay means comprises:

multiplexer means responsive to predetermined ones of said
external control signals connected to said partial product
generating means for selectively generating at an output a
least-significant or a most-significant portion of said partial
product word; and

register means responsive to said external clock signal and
to predetermined ones of said external control signals connected
to said multiplexer means output for generating at an output said
selected partial product word at the clock cycle next-following
application of said predetermined external control signals.

8. A method of cyclically processing two multiple-precision
operand words each comprising a plurality of single-precision
words and generating therefrom an extended-precision product,
comprising the steps of:

(a) forming a double-precision product of a
least-significant one of said single-precision operand words,
during an nth cycle;

(b) temporarily storing said double-precision product formed
at step (a), during an n+1st cycle;

(c) forming a double-precision cross-product of
predetermined ones of said single-precision operand words, during
said n+1st cycle;

(d) arithmetically combining said double-precision
cross-product formed at step (c) with predetermined ones of said
previously stored double-precision products arithmetically scaled
by a predetermined amount, during said n+1st cycle;

(e) temporarily storing a least-significant single-precision
portion of said double-precision product formed in step (d),
during an n+2nd cycle;

(f) temporarily storing said double-precision word formed at
step (d), during said n+2nd cycle;

(g) generating during an n+3rd cycle said least-significant
single-precision product word temporarily stored at step (e);

(h) incrementing n by 2 and repeating steps (c), (d) and
(f), a predetermined number "m" times, 0 <= m;

(i) generating during an (n+2m)th cycle a least-significant
single-precision portion of said double-precision product
temporarily stored at step (f);

(j) repeating steps (h) and (i) a predetermined number "p"
times, 0 <= p;

(k) temporarily storing a most-significant single-precision
portion of said double-precision product temporarily stored at
step (f), during an (n+2mp)th cycle; and

(l) generating during an (n+2mp+1)st cycle said
single-precision word temporarily stored at step (k).

9. An extended-precision multiplication method according to
claim 8 wherein said multiple-precision operand words are
double-precision words each comprising a least-significant and a
most-significant single-precision word portion, and wherein said
value V=(L

10. An extended-precision multiplication method according to
claim 9 further including the step of initiating during said
n+2nd cycle of said present multiplication another said
extended-precision multiplication at step (a) concurrently with
said present multiplication.